Power Aware Wireless File Downloading: A Lyapunov Indexing
Approach to A Constrained Restless Bandit Problem

Xiaohan Wei and Michael J. Neely 


Abstract: This paper treats power-aware throughput maximization
in a multi-user file downloading system. Each user can
receive a new file only after its previous file is finished. The
file state processes for each user act as coupled Markov chains
that form a generalized restless bandit system. First, an optimal
algorithm is derived for the case of one user. The algorithm
maximizes throughput subject to an average power constraint.
Next, the one-user algorithm is extended to a low complexity
heuristic for the multi-user problem. The heuristic uses a simple
online index policy. In a special case with no power constraint,
the multi-user heuristic is shown to be throughput optimal.
Simulations are used to demonstrate the effectiveness of the
heuristic in the general case. For simple cases where the optimal
solution can be computed offline, the heuristic is shown to be
near-optimal for a wide range of parameters.

I. Introduction 

Consider a wireless access point, such as a base station
or femto node, that delivers files to $N$ different wireless
users. The system operates in slotted time with time slots
$t \in \{0, 1, 2, \ldots\}$. Each user can download at most one file at
a time. File sizes are random and complete delivery of a file
requires a random number of time slots. A new file request
is made by each user at a random time after it finishes its
previous download. Let $F_n(t) \in \{0, 1\}$ represent the binary
file state process for user $n \in \{1, \ldots, N\}$. The state $F_n(t) = 1$
means that user $n$ is currently active downloading a file, while
the state $F_n(t) = 0$ means that user $n$ is currently idle.

Idle times are assumed to be independent and geometrically
distributed with parameter $\lambda_n$ for each user $n$, so that the
average idle time is $1/\lambda_n$. Active times depend on the random
file size and the transmission decisions that are made. Every
slot $t$, the access point observes which users are active and
decides to serve a subset of at most $M$ users, where $M$ is
the maximum number of simultaneous transmissions allowed
in the system ($M < N$ is assumed throughout). The goal is
to maximize a weighted sum of throughput subject to a total
average power constraint.

The file state processes $F_n(t)$ are coupled controlled
Markov chains that form a total state $(F_1(t), \ldots, F_N(t))$
that can be viewed as a restless multi-armed bandit system.
Such problems are complex due to the inherent curse of
dimensionality.

This paper first computes an online optimal algorithm for
1-user systems, i.e., the case $N = 1$. This simple case avoids
the curse of dimensionality and provides valuable intuition.
The optimal policy here is nontrivial and uses the theory
of Lyapunov optimization for renewal systems [?]. The
resulting algorithm makes a greedy transmission decision that
affects success probability and power usage. The decision is
based on a drift-plus-penalty index. Next, the algorithm is
extended as a low complexity online heuristic for the $N$-user
problem. The heuristic has the following desirable properties:

• Implementation of the $N$-user heuristic is as simple as
comparing indices for $N$ different 1-user problems.

• The $N$-user heuristic is analytically shown to meet the
desired average power constraint.

• The $N$-user heuristic is shown in simulation to perform
well over a wide range of parameters. Specifically, it is
very close to optimal for example cases where an offline
optimal can be computed.

• The $N$-user heuristic is shown to be optimal in a special
case with no power constraint and with certain additional
assumptions. The optimality proof uses a theory
of stochastic coupling for queueing systems [?].

Prior work on wireless optimization uses Lyapunov functions
to maximize throughput in cases where the users
are assumed to have an infinite amount of data to send
[4][5][6][7][8][9][10], or when data arrives according to a
fixed rate process that does not depend on delays in the
network (which necessitates dropping data if the arrival rate
vector is outside of the capacity region) [?]. These models
do not consider the interplay between arrivals at the transport
layer and file delivery at the network layer. For example,
a web user in a coffee shop may want to evaluate the
file she downloaded before initiating another download. The
current paper captures this interplay through the binary file
state processes $F_n(t)$. This creates a complex problem of
coupled Markov chains. This problem is fundamental to file
downloading systems. The modeling and analysis of these
systems is a significant contribution of the current paper.

To understand this issue, suppose the data arrival rate is
fixed and does not adapt to the service received over the
network. If this arrival rate exceeds network capacity by a
factor of two, then at least half of all data must be dropped.
This can result in an unusable data stream, possibly one
that contains every odd-numbered packet. A more practical
model assumes that full files must be downloaded and that
new downloads are only initiated when previous ones are
completed. A general model in this direction would allow each
user to download up to $K$ files simultaneously. This paper
considers the case $K = 1$, so that each user is either actively
downloading a file, or is idle.^1 The resulting system for $N$
users has a nontrivial Markov structure with $2^N$ states.

Markov decision problems (MDPs) can be solved offline via
linear programming [?]. This can be prohibitively complex
for large dimensional problems. Low complexity solutions for
coupled MDPs are possible in special cases when the coupling
involves only time average constraints [?]. Finite horizon
coupled MDPs are treated via integer programming in [?]
and via a heuristic “task decomposition” method in [14]. The
problem of the current paper does not fit the framework of
[?]-[?] because it includes both time-average constraints
(on average power expenditure) and instantaneous constraints
which restrict the number of users that can be served on one
slot. The latter service restriction is similar to a traditional
restless multi-armed bandit (RMAB) system [?].

RMAB problems consider a population of $N$ parallel
MDPs that continue evolving whether in operation or not
(although under different rules). The goal is to choose
the MDPs in operation during each time slot so as to
maximize the expected reward subject to a constraint
on the number of MDPs in operation. The problem is in
general complex (see P-SPACE hardness results in [?]). A
standard low-complexity heuristic for such problems is the
Whittle's index technique [?]. However, the Whittle's index
framework applies only when there are two options on
each state (active and passive). Further, it does not consider
the additional time average cost constraints. The Lyapunov
indexing algorithm developed in the current paper can
be viewed as an alternative indexing scheme that can
always be implemented and that incorporates additional
time average constraints. It is likely that the techniques of
the current paper can be extended to other constrained RMAB
problems. Prior work in [?] develops a Lyapunov drift method
for queue stability, and work in [?] develops a drift-plus-penalty
ratio method for optimization over renewal systems.
The current work is the first to use these techniques as a low
complexity heuristic for multidimensional Markov problems.

Work in [?] uses the theory of stochastic coupling to show
that a longest connected queue algorithm is delay optimal in
a multi-dimensional queueing system with special symmetric
assumptions. The problem in [?] is different from that of the
current paper. However, a similar coupling approach is used
in Section IV to show that, for a special case with no power
constraint, the Lyapunov indexing algorithm is throughput
optimal in certain asymmetric cases. As a consequence, the
proof shows the policy is also optimal for a different setting
with $M$ servers, $N$ single-buffer queues, and arbitrary packet
arrival rates $(\lambda_1, \ldots, \lambda_N)$.

II. Single user scenario

Consider a file downloading system that consists of only one
user that repeatedly downloads files. Let $F(t) \in \{0, 1\}$ be the
file state process of the user. State “1” means there is a file in
the system that has not completed its download, and “0” means
no file is waiting. The length of each file is independent and
is either exponentially distributed or geometrically distributed
(described in more detail below). Let $B$ denote the expected
file size in bits. Time is slotted. At each slot in which there
is an active file for downloading, the user makes a service
decision that affects both the downloading success probability
and the power expenditure. After a file is downloaded, the
system goes idle (state 0) and remains in the idle state for a
random amount of time that is independent and geometrically
distributed with parameter $\lambda > 0$.

A transmission decision is made on each slot $t$ in which
$F(t) = 1$. The decision affects the number of bits that are sent,
the probability these bits are successfully received, and the
power usage. Let $\alpha(t)$ denote the decision variable at slot $t$ and
let $\mathcal{A}$ represent an abstract action set with a finite number of
elements. The set $\mathcal{A}$ can represent a collection of modulation
and coding options for each transmission. Assume also that
$\mathcal{A}$ contains an idle action denoted as “0.” The decision $\alpha(t)$
determines the following two values:

• The probability of successfully downloading a file,
$\phi(\alpha(t))$, where $\phi(\cdot) \in [0, 1]$ with $\phi(0) = 0$.

• The power expenditure $p(\alpha(t))$, where $p(\cdot)$ is a nonnegative
function with $p(0) = 0$.

The user chooses $\alpha(t) = 0$ whenever $F(t) = 0$. The user
chooses $\alpha(t) \in \mathcal{A}$ for each slot $t$ in which $F(t) = 1$, with
the goal of maximizing throughput subject to a time average
power constraint.

The problem can be described by a two state Markov
decision process with binary state $F(t)$. Given $F(t) = 1$, a file
is currently in the system. This file will finish its download
at the end of the slot with probability $\phi(\alpha(t))$. Hence, the
transition probabilities out of state 1 are:

$$\Pr[F(t+1) = 0 \mid F(t) = 1] = \phi(\alpha(t)) \tag{1}$$

$$\Pr[F(t+1) = 1 \mid F(t) = 1] = 1 - \phi(\alpha(t)) \tag{2}$$

Given $F(t) = 0$, the system is idle and will transition to the
active state in the next slot with probability $\lambda$, so that:

$$\Pr[F(t+1) = 1 \mid F(t) = 0] = \lambda \tag{3}$$

$$\Pr[F(t+1) = 0 \mid F(t) = 0] = 1 - \lambda \tag{4}$$
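
The transition rules (1)-(4) can be simulated directly. The following is a minimal sketch, assuming a fixed completion probability `phi` in every active slot (that is, the same action is used whenever a file is present). The function name, the parameter values, and the choice to start in the active state are illustrative assumptions and are not taken from the paper.

```python
import random

def simulate_file_state(phi, lam, num_slots, seed=0):
    """Simulate the two-state chain F(t) under a fixed completion probability.

    phi: probability that an active file completes in a slot, as in (1)-(2).
    lam: probability that an idle user requests a new file, as in (3)-(4).
    Returns the list of states F(0), ..., F(num_slots - 1).
    """
    rng = random.Random(seed)
    state = 1  # assumption: the user starts with an active file
    path = []
    for _ in range(num_slots):
        path.append(state)
        if state == 1:
            state = 0 if rng.random() < phi else 1
        else:
            state = 1 if rng.random() < lam else 0
    return path

path = simulate_file_state(phi=0.5, lam=0.25, num_slots=200_000)
active_fraction = sum(path) / len(path)
# For a fixed phi, the stationary probability of state 1 is lam / (lam + phi).
print(active_fraction, 0.25 / (0.25 + 0.5))
```

With a fixed action the chain is a plain two-state Markov chain, so the empirical fraction of active slots should approach $\lambda / (\lambda + \phi)$; the controlled problem differs only in that $\phi$ changes with the chosen action.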


^1 One way to allow a user $n$ to download up to $K$ files simultaneously is
as follows: Define $K$ virtual users with separate binary file state processes.
The transition probability from idle to active in each of these virtual users is
$\lambda_n / K$. The conditional rate of total new arrivals for user $n$ (given that $m$
files are currently in progress) is then $\lambda_n (1 - m/K)$ for $m \in \{0, 1, \ldots, K\}$.


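Define the throughput, measured in bits per slot, as

$$\liminf_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} B\phi(\alpha(t))$$

The file downloading problem reduces to the following:

$$\text{Max:} \quad \liminf_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} B\phi(\alpha(t)) \tag{5}$$

$$\text{S.t.:} \quad \limsup_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} p(\alpha(t)) \le \beta \tag{6}$$

$$\alpha(t) \in \mathcal{A} \quad \forall t \in \{0, 1, 2, \ldots\} \text{ such that } F(t) = 1 \tag{7}$$

$$\text{Transition probabilities satisfy (1)-(4)} \tag{8}$$

where $\beta$ is a positive constant that determines the desired
average power constraint.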


A. The memoryless file size assumption

The above model assumes that file completion success
on slot $t$ depends only on the transmission decision $\alpha(t)$,
independent of history. This implicitly assumes that file length
distributions have a memoryless property where the residual
file length is independent of the amount already delivered.
Further, it is assumed that if the controller selects a transmission
rate that is larger than the residual bits in the file,
the remaining portion of the transmission is padded with fill
bits. This ensures error events provide no information about
the residual file length beyond the already known 0/1 binary
file state. Of course, error probability might be improved
by removing padded bits. However, this affects only the last
transmission of a file and has negligible impact when expected
file size is large in comparison to the amount that can be
transmitted in one slot. Note that padding is not needed in
the special case when all transmissions send one fixed length
packet.

The memoryless property holds when each file $i$ has independent
length $B_i$ that is exponentially distributed with mean
length $B$ bits, so that:

$$\Pr[B_i > x] = e^{-x/B} \quad \text{for } x \ge 0$$

For example, suppose the transmission rate $r(t)$ (in units of
bits/slot) and the transmission success probability $q(t)$ are
given by general functions of $\alpha(t)$:

$$r(t) = r(\alpha(t))$$

$$q(t) = q(\alpha(t))$$

Then the file completion probability $\phi(\alpha(t))$ is the probability
that the residual amount of bits in the file is less than or
equal to $r(t)$, and that the transmission of these residual bits
is a success. By the memoryless property of the exponential
distribution, the residual file length is distributed the same as
the original file length. Thus:

$$\phi(\alpha(t)) = q(\alpha(t)) \Pr[B_i \le r(\alpha(t))] = q(\alpha(t)) \int_0^{r(\alpha(t))} \frac{1}{B} e^{-x/B} \, dx \tag{9}$$

Alternatively, history independence holds when each file
$i$ consists of a random number $Z_i$ of fixed length packets,
where $Z_i$ is geometrically distributed with mean $Z = 1/\mu$.
Assume each transmission sends exactly one packet, but different
power levels affect the transmission success probability
$q(t) = q(\alpha(t))$. Then:

$$\phi(\alpha(t)) = \mu q(\alpha(t)) \tag{10}$$
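
The two completion-probability models (9) and (10) are simple enough to evaluate in closed form, since the integral in (9) equals $1 - e^{-r/B}$. The sketch below is illustrative only; the function names and the numeric inputs are assumptions, not values from the paper.

```python
import math

def phi_exponential(q, r, mean_bits):
    """Eq. (9): completion probability for exponentially distributed file sizes.

    q: transmission success probability for the chosen action.
    r: number of bits sent in the slot for the chosen action.
    mean_bits: expected file size B.
    """
    return q * (1.0 - math.exp(-r / mean_bits))

def phi_geometric(q, mu):
    """Eq. (10): one packet per slot, geometric packet count with mean 1/mu."""
    return mu * q

# Hypothetical inputs: 1000 bits per slot against a 10000-bit mean file size.
print(phi_exponential(q=0.9, r=1000.0, mean_bits=10_000.0))
print(phi_geometric(q=0.9, mu=0.1))
```

When $r \ll B$, the exponential model is close to $q r / B$, which matches the geometric model with $\mu = r/B$.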

The memoryless file length assumption allows the file state
to be modeled by a simple binary-valued process $F(t) \in \{0, 1\}$.
However, actual file sizes may not have an exponential
or geometric distribution. One way to treat general distributions
is to approximate the file sizes as being memoryless
by using a $\phi(\alpha(t))$ function defined by either (9) or (10),
formed by matching the average file size $B$ or average number
of packets $Z$. The decisions $\alpha(t)$ are made according to the
algorithm below, but the actual event outcomes that arise from
these decisions are not memoryless. A simulation comparison
of this approximation is provided in a later section, where it is
shown to be remarkably accurate (see Fig. 7).

The algorithm in this section optimizes over the class of all
algorithms that do not use residual file length information.
This maintains low complexity by ensuring a user has a
binary-valued Markov state $F(t) \in \{0, 1\}$. While a system
controller might know the residual file length, incorporating
this knowledge creates a Markov decision problem with an
infinite number of states (one for each possible value of
residual length) which significantly complicates the scenario.


B. Lyapunov optimization 

This subsection develops an online algorithm for problem
(5)-(8). First, notice that file state “1” is recurrent under any
decisions for $\alpha(t)$. Denote $t_k$ as the $k$-th time when the system
returns to state “1.” Define the renewal frame as the time
period between $t_k$ and $t_{k+1}$. Define the frame size:

$$T[k] = t_{k+1} - t_k$$

Notice that $T[k] = 1$ for any frame $k$ in which the file does
not complete its download. If the file is completed on frame $k$,
then $T[k] = 1 + G_k$, where $G_k$ is a geometric random variable
with mean $\mathbb{E}[G_k] = 1/\lambda$. Each frame $k$ involves only a single
decision $\alpha(t_k)$ that is made at the beginning of the frame.
Thus, the total power used over the duration of frame $k$ is:

$$\sum_{t=t_k}^{t_{k+1}-1} p(\alpha(t)) = p(\alpha(t_k)) \tag{11}$$

Using a technique similar to that proposed in [?], we treat the
time average constraint in (6) using a virtual queue $Q[k]$ that
is updated every frame $k$ by:

$$Q[k+1] = \max\{Q[k] + p(\alpha(t_k)) - \beta T[k], \, 0\} \tag{12}$$

with initial condition $Q[0] = 0$. The algorithm is then parameterized
by a constant $V > 0$ which affects a performance
tradeoff. At the beginning of the $k$-th renewal frame, the user
observes virtual queue $Q[k]$ and chooses $\alpha(t_k)$ to maximize
the following drift-plus-penalty (DPP) ratio [?]:

$$\frac{VB\phi(\alpha(t_k)) - Q[k]p(\alpha(t_k))}{\mathbb{E}[T[k] \mid \alpha(t_k)]} \tag{13}$$

The numerator of the above ratio adds a “queue drift term”
$-Q[k]p(\alpha(t_k))$ to the “current reward term” $VB\phi(\alpha(t_k))$.
The intuition is that it is desirable to have a large value of
current reward, but it is also desirable to have a large drift
(since this tends to decrease queue size). Creating a weighted
sum of these two terms and dividing by the expected frame
size gives a simple index. The next subsections show that,
for the context of the current paper, this index leads to an
algorithm that pushes throughput arbitrarily close to optimal
(depending on the chosen $V$ parameter) with a strong sample
path guarantee on average power expenditure.

The denominator in (13) can easily be computed:

$$\mathbb{E}[T[k] \mid \alpha(t_k)] = 1 + \frac{\phi(\alpha(t_k))}{\lambda}$$
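
For completeness, this expected frame size follows in one line from the frame structure described earlier: a frame lasts one slot if the file does not complete (probability $1 - \phi(\alpha(t_k))$), and lasts $1 + G_k$ slots if it does, where $\mathbb{E}[G_k] = 1/\lambda$:

$$\mathbb{E}[T[k] \mid \alpha(t_k)] = \big(1 - \phi(\alpha(t_k))\big) \cdot 1 + \phi(\alpha(t_k)) \left(1 + \frac{1}{\lambda}\right) = 1 + \frac{\phi(\alpha(t_k))}{\lambda}$$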

Thus, (13) is equivalent to

$$\max_{\alpha(t_k) \in \mathcal{A}} \frac{VB\phi(\alpha(t_k)) - Q[k]p(\alpha(t_k))}{1 + \phi(\alpha(t_k))/\lambda} \tag{14}$$

Since there are only a finite number of elements in $\mathcal{A}$, (14)
is easily computed. This gives the following algorithm for the
single-user case:


Algorithm 1:

• At each time $t_k$, the user observes virtual queue $Q[k]$
and chooses $\alpha(t_k)$ as the solution to (14) (where ties are
broken arbitrarily).

• The value $Q[k+1]$ is computed according to (12) at the
end of the $k$-th frame.
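
The two steps of Algorithm 1 can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the representation of the action set as a dictionary of `(phi, power)` pairs, the function names, and the numeric values are all assumptions made for the example.

```python
def choose_action(actions, Q, V, B, lam):
    """Return the action label that maximizes the ratio in (14).

    actions: dict mapping an action label to a (phi, power) pair. The idle
             action 0 with (0.0, 0.0) is assumed to be present.
    Ties are broken in favor of the first maximizer in dict order.
    """
    def ratio(a):
        phi, power = actions[a]
        return (V * B * phi - Q * power) / (1.0 + phi / lam)
    return max(actions, key=ratio)

def update_queue(Q, power, frame_size, beta):
    """Virtual queue update (12), applied at the end of a frame."""
    return max(Q + power - beta * frame_size, 0.0)

# Hypothetical action set: label -> (completion probability, power).
ACTIONS = {0: (0.0, 0.0), 1: (0.3, 1.0), 2: (0.5, 3.0)}

a_empty = choose_action(ACTIONS, Q=0.0, V=10.0, B=1.0, lam=0.5)
a_backlog = choose_action(ACTIONS, Q=100.0, V=10.0, B=1.0, lam=0.5)
print(a_empty, a_backlog)
```

With an empty virtual queue the rule favors the action with the largest reward per expected slot, while a large queue value makes every power-consuming action unattractive and the idle action is selected.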


C. Average power constraints via queue bounds 

Lemma 1: If there is a constant $C > 0$ such that $Q[k] \le C$
for all $k \in \{0, 1, 2, \ldots\}$, then:

$$\limsup_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} p(\alpha(t)) \le \beta$$

Proof: From (12), we know that for each frame $k$:

$$Q[k+1] \ge Q[k] + p(\alpha(t_k)) - T[k]\beta$$

Rearranging terms and using $T[k] = t_{k+1} - t_k$ gives:

$$p(\alpha(t_k)) \le (t_{k+1} - t_k)\beta + Q[k+1] - Q[k]$$

Fix $K > 0$. Summing over $k \in \{0, 1, \ldots, K-1\}$ gives:

$$\sum_{k=0}^{K-1} p(\alpha(t_k)) \le (t_K - t_0)\beta + Q[K] - Q[0] \le t_K \beta + C$$

The sum power over the first $K$ frames is the same as the sum
up to time $t_K - 1$, and so:

$$\sum_{t=0}^{t_K - 1} p(\alpha(t)) \le t_K \beta + C$$

Dividing by $t_K$ gives:

$$\frac{1}{t_K} \sum_{t=0}^{t_K - 1} p(\alpha(t)) \le \beta + C/t_K$$

Taking $K \to \infty$, then,

$$\limsup_{K \to \infty} \frac{1}{t_K} \sum_{t=0}^{t_K - 1} p(\alpha(t)) \le \beta \tag{15}$$

Now for each positive integer $T$, let $K(T)$ be the integer such
that $t_{K(T)} \le T < t_{K(T)+1}$. Since power is only used at the
first slot of a frame, one has:

$$\frac{1}{T} \sum_{t=0}^{T-1} p(\alpha(t)) \le \frac{1}{t_{K(T)}} \sum_{t=0}^{t_{K(T)+1}-1} p(\alpha(t))$$

Taking a limsup as $T \to \infty$ and using (15) yields the result.


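The next lemma shows that the queue process under our
proposed algorithm is deterministically bounded. Define:

$$p^{\min} = \min_{\alpha \in \mathcal{A} \setminus \{0\}} p(\alpha)$$

$$p^{\max} = \max_{\alpha \in \mathcal{A} \setminus \{0\}} p(\alpha)$$

Assume that $p^{\min} > 0$.

Lemma 2: If $Q[0] = 0$, then under our algorithm we have
for all $k \ge 0$:

$$Q[k] \le \max\left\{ \frac{VB}{p^{\min}} + p^{\max} - \beta, \; 0 \right\}$$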


Proof: First, consider the case when $p^{\max} \le \beta$. From (12)
and the fact that $T[k] \ge 1$ for all $k$, it is clear the queue
can never increase, and so $Q[k] \le Q[0] = 0$ for all $k \ge 0$.

Next, consider the case when $p^{\max} > \beta$. We prove the
assertion by induction on $k$. The result trivially holds for $k = 0$.
Suppose it holds at $k = l$ for $l \ge 0$, so that:

$$Q[l] \le \frac{VB}{p^{\min}} + p^{\max} - \beta$$

We are going to prove that the same holds for $k = l + 1$. There
are two cases:

1) $Q[l] \le \frac{VB}{p^{\min}}$. In this case we have by (12):

$$Q[l+1] \le Q[l] + p^{\max} - \beta \le \frac{VB}{p^{\min}} + p^{\max} - \beta$$

2) $\frac{VB}{p^{\min}} < Q[l] \le \frac{VB}{p^{\min}} + p^{\max} - \beta$. In this case, if
$p(\alpha(t_l)) = 0$ then the queue cannot increase, so:

$$Q[l+1] \le Q[l] \le \frac{VB}{p^{\min}} + p^{\max} - \beta$$

On the other hand, if $p(\alpha(t_l)) > 0$ then $p(\alpha(t_l)) \ge p^{\min}$
and so the numerator in (14) satisfies:

$$VB\phi(\alpha(t_l)) - Q[l]p(\alpha(t_l)) \le VB - Q[l]p^{\min} < 0$$

and so the maximizing ratio in (14) is negative. However,
the maximizing ratio in (14) cannot be negative because
the alternative choice $\alpha(t_l) = 0$ increases the ratio
to 0. This contradiction implies that we cannot have
$p(\alpha(t_l)) > 0$.

The above is a sample path result that only assumes
parameters satisfy $\lambda > 0$, $B > 0$, and $0 \le \phi(\cdot) \le 1$. Thus,
the algorithm meets the average power constraint even if it
uses incorrect values for these parameters. The next subsection
provides a throughput optimality result when these parameters
match the true system values.
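
Because Lemmas 1 and 2 are sample path results, they can be checked on any simulated run. The sketch below simulates the single-user algorithm frame by frame and compares the largest observed queue value against the bound of Lemma 2. The action set, the parameter values, and the function name are assumptions chosen only for illustration; throughput is credited as $B\phi(\alpha(t_k))$ per frame, following the objective (5).

```python
import random

def run_single_user(actions, V, B, lam, beta, num_frames, seed=0):
    """Run the single-user algorithm frame by frame.

    actions: list of (phi, power) pairs, including the idle action (0.0, 0.0).
    Returns (largest queue value seen, average power per slot, throughput).
    """
    rng = random.Random(seed)
    Q = max_Q = 0.0
    total_power = total_bits = 0.0
    total_slots = 0
    for _ in range(num_frames):
        # Greedy choice maximizing the drift-plus-penalty ratio (14).
        phi, power = max(
            actions,
            key=lambda a: (V * B * a[0] - Q * a[1]) / (1.0 + a[0] / lam),
        )
        frame = 1  # the decision slot itself
        if rng.random() < phi:
            # File completed: idle for a geometric time with mean 1/lam.
            idle = 1
            while rng.random() >= lam:
                idle += 1
            frame += idle
        total_bits += B * phi  # reward credited as in objective (5)
        total_power += power
        total_slots += frame
        Q = max(Q + power - beta * frame, 0.0)  # virtual queue update (12)
        max_Q = max(max_Q, Q)
    return max_Q, total_power / total_slots, total_bits / total_slots

# Hypothetical parameters, chosen only for illustration.
ACTIONS = [(0.0, 0.0), (0.3, 1.0), (0.5, 3.0)]
V, B, LAM, BETA = 50.0, 1.0, 0.5, 0.4
max_Q, avg_power, throughput = run_single_user(ACTIONS, V, B, LAM, BETA, 200_000)
queue_bound = V * B / 1.0 + 3.0 - BETA  # V*B/p_min + p_max - beta
print(max_Q <= queue_bound, avg_power, throughput)
```

In this sketch the average power can exceed $\beta$ by at most the queue bound divided by the number of elapsed slots, which is the $C/t_K$ term from the proof of Lemma 1.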


D. Optimality over randomized algorithms

Consider the following class of i.i.d. randomized algorithms:
Let $\theta(\alpha)$ be non-negative numbers defined for each
$\alpha \in \mathcal{A}$, and suppose they satisfy $\sum_{\alpha \in \mathcal{A}} \theta(\alpha) = 1$.
Let $\alpha^*(t)$ represent a policy that, every slot $t$ for which $F(t) = 1$,
chooses $\alpha^*(t) \in \mathcal{A}$ by independently selecting strategy $\alpha$
with probability $\theta(\alpha)$. Then $(p(\alpha^*(t_k)), \phi(\alpha^*(t_k)))$ are
independent and identically distributed (i.i.d.) over frames $k$. Under
this algorithm, it follows by the law of large numbers that the
throughput and power expenditure satisfy (with probability 1):

$$\lim_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} B\phi(\alpha^*(t)) = \frac{B\,\mathbb{E}[\phi(\alpha^*(t_k))]}{1 + \mathbb{E}[\phi(\alpha^*(t_k))]/\lambda}$$

$$\lim_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} p(\alpha^*(t)) = \frac{\mathbb{E}[p(\alpha^*(t_k))]}{1 + \mathbb{E}[\phi(\alpha^*(t_k))]/\lambda}$$

It can be shown that optimality of problem (5)-(8) can be
achieved over this class [?]. Thus, there exists an i.i.d. randomized
algorithm $\alpha^*(t)$ that satisfies:

$$\frac{B\,\mathbb{E}[\phi(\alpha^*(t_k))]}{1 + \mathbb{E}[\phi(\alpha^*(t_k))]/\lambda} = \mu^* \tag{16}$$

$$\frac{\mathbb{E}[p(\alpha^*(t_k))]}{1 + \mathbb{E}[\phi(\alpha^*(t_k))]/\lambda} \le \beta \tag{17}$$

where $\mu^*$ is the optimal throughput for the problem (5)-(8).


E. Key feature of the drift-plus-penalty ratio

Define $\mathcal{H}[k]$ as the system history up to frame $k$, which
includes the actions taken $\alpha[0], \ldots, \alpha[k-1]$, frame lengths
$T[0], \ldots, T[k-1]$, the busy period in each frame, the idle
period in each frame, and the queue value $Q[k]$ (since this is
determined by the random events before frame $k$). Consider
the algorithm that, on frame $k$, observes $Q[k]$ and chooses
$\alpha(t_k)$ according to (14). The following key feature of this
algorithm can be shown (see [?] for related results):

$$\frac{\mathbb{E}[-VB\phi(\alpha(t_k)) + Q[k]p(\alpha(t_k)) \mid \mathcal{H}[k]]}{\mathbb{E}[1 + \phi(\alpha(t_k))/\lambda \mid \mathcal{H}[k]]} \le \frac{\mathbb{E}[-VB\phi(\alpha^*(t_k)) + Q[k]p(\alpha^*(t_k)) \mid \mathcal{H}[k]]}{\mathbb{E}[1 + \phi(\alpha^*(t_k))/\lambda \mid \mathcal{H}[k]]}$$

where $\alpha^*(t_k)$ is any (possibly randomized) alternative decision
that is based only on $\mathcal{H}[k]$. This is an intuitive property:
By design, the algorithm in (14) observes $\mathcal{H}[k]$ and then
chooses a particular action $\alpha(t_k)$ to minimize the ratio over all
deterministic actions. Thus, as can be shown, it also minimizes
the ratio over all potentially randomized actions. Using the
(randomized) i.i.d. decision $\alpha^*(t_k)$ from (16)-(17) in the above
and noting that this alternative decision is independent of $\mathcal{H}[k]$
gives:

$$\frac{\mathbb{E}[-VB\phi(\alpha(t_k)) + Q[k]p(\alpha(t_k)) \mid \mathcal{H}[k]]}{\mathbb{E}[1 + \phi(\alpha(t_k))/\lambda \mid \mathcal{H}[k]]} \le -V\mu^* + Q[k]\beta \tag{18}$$
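
The closed-form time averages for the i.i.d. randomized class of subsection D are easy to evaluate numerically, which is useful when comparing a candidate mixture against the power budget. The following sketch is illustrative only; the function name, the action set, and the mixing probabilities are assumptions.

```python
def iid_policy_averages(theta, actions, B, lam):
    """Time averages under an i.i.d. randomized policy.

    theta: list of probabilities over the actions (assumed to sum to 1).
    actions: list of (phi, power) pairs in the same order as theta.
    Returns (throughput, average power) from the renewal-reward ratios.
    """
    mean_phi = sum(w * phi for w, (phi, _) in zip(theta, actions))
    mean_pow = sum(w * p for w, (_, p) in zip(theta, actions))
    mean_frame = 1.0 + mean_phi / lam  # expected frame size
    return B * mean_phi / mean_frame, mean_pow / mean_frame

# Hypothetical action set and mixture: idle half of the time, action 1 otherwise.
ACTIONS = [(0.0, 0.0), (0.3, 1.0), (0.5, 3.0)]
throughput, avg_power = iid_policy_averages([0.5, 0.5, 0.0], ACTIONS, B=1.0, lam=0.5)
print(throughput, avg_power)
```

Both averages share the same denominator, the expected frame size, which is why the drift-plus-penalty index divides by the same quantity.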


F. Performance theorem

Theorem 1: The proposed algorithm achieves the constraint
$\limsup_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} p(\alpha(t)) \le \beta$ and yields throughput
satisfying (with probability 1):

$$\liminf_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} B\phi(\alpha(t)) \ge \mu^* - \frac{C_0}{V} \tag{19}$$

where $C_0$ is a constant.^2

Proof: First, for any fixed $V$, Lemma 2 implies that
the queue is deterministically bounded. Thus, according to
Lemma 1, the proposed algorithm achieves the constraint
$\limsup_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} p(\alpha(t)) \le \beta$. The rest is devoted to
proving the throughput guarantee (19).

Define:

$$L(Q[k]) = \frac{1}{2} Q[k]^2$$

We call this a Lyapunov function. Define a frame-based
Lyapunov drift as:

$$\Delta[k] = L(Q[k+1]) - L(Q[k])$$

According to (12) we get

$$Q[k+1]^2 \le (Q[k] + p(\alpha(t_k)) - T[k]\beta)^2$$

Thus:

$$\Delta[k] \le \frac{(p(\alpha(t_k)) - T[k]\beta)^2}{2} + Q[k](p(\alpha(t_k)) - T[k]\beta)$$
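
The step from the squared-queue inequality to the drift bound is just an expansion of the square. Written out, with no assumptions beyond the definitions above:

$$\Delta[k] = \frac{1}{2} Q[k+1]^2 - \frac{1}{2} Q[k]^2 \le \frac{1}{2}\big(Q[k] + p(\alpha(t_k)) - T[k]\beta\big)^2 - \frac{1}{2} Q[k]^2 = \frac{1}{2}\big(p(\alpha(t_k)) - T[k]\beta\big)^2 + Q[k]\big(p(\alpha(t_k)) - T[k]\beta\big)$$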


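Taking a conditional expectation of the above given $\mathcal{H}[k]$ and
recalling that $\mathcal{H}[k]$ includes the information $Q[k]$ gives:

$$\mathbb{E}[\Delta[k] \mid \mathcal{H}[k]] \le C_0 + Q[k]\,\mathbb{E}[p(\alpha(t_k)) - \beta T[k] \mid \mathcal{H}[k]] \tag{20}$$

where $C_0$ is a constant that satisfies the following for all
possible histories $\mathcal{H}[k]$:

$$\mathbb{E}\left[ \frac{(p(\alpha(t_k)) - T[k]\beta)^2}{2} \,\Big|\, \mathcal{H}[k] \right] \le C_0$$

Such a constant $C_0$ exists because the power $p(\alpha(t_k))$ is
deterministically bounded, and the frame sizes $T[k]$ are bounded
in second moment regardless of history.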

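Adding the “penalty” $-\mathbb{E}[VB\phi(\alpha(t_k)) \mid \mathcal{H}[k]]$ to both sides
of (20) gives:

$$\mathbb{E}[\Delta[k] - VB\phi(\alpha(t_k)) \mid \mathcal{H}[k]] \le C_0 + \mathbb{E}[-VB\phi(\alpha(t_k)) + Q[k](p(\alpha(t_k)) - \beta T[k]) \mid \mathcal{H}[k]]$$

$$= C_0 - Q[k]\beta\,\mathbb{E}[T[k] \mid \mathcal{H}[k]] + \mathbb{E}[T[k] \mid \mathcal{H}[k]] \, \frac{\mathbb{E}[-VB\phi(\alpha(t_k)) + Q[k]p(\alpha(t_k)) \mid \mathcal{H}[k]]}{\mathbb{E}[T[k] \mid \mathcal{H}[k]]}$$

^2 The constant $C_0$ is independent of $V$ and is given in the proof.

Expanding $T[k]$ in the denominator of the last term gives:

$$\mathbb{E}[\Delta[k] - VB\phi(\alpha(t_k)) \mid \mathcal{H}[k]] \le C_0 - Q[k]\beta\,\mathbb{E}[T[k] \mid \mathcal{H}[k]] + \mathbb{E}[T[k] \mid \mathcal{H}[k]] \times \frac{\mathbb{E}[-VB\phi(\alpha(t_k)) + Q[k]p(\alpha(t_k)) \mid \mathcal{H}[k]]}{\mathbb{E}[1 + \phi(\alpha(t_k))/\lambda \mid \mathcal{H}[k]]}$$

Substituting (18) into the above expression gives:

$$\mathbb{E}[\Delta[k] - VB\phi(\alpha(t_k)) \mid \mathcal{H}[k]] \le C_0 - Q[k]\beta\,\mathbb{E}[T[k] \mid \mathcal{H}[k]] + \mathbb{E}[T[k] \mid \mathcal{H}[k]](-V\mu^* + \beta Q[k])$$

$$= C_0 - V\mu^*\,\mathbb{E}[T[k] \mid \mathcal{H}[k]] \tag{21}$$

Rearranging gives:

$$\mathbb{E}[\Delta[k] + V(\mu^* T[k] - B\phi(\alpha(t_k))) \mid \mathcal{H}[k]] \le C_0 \tag{22}$$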
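The above is a drift-plus-penalty expression. Because we
already know the queue $Q[k]$ is deterministically bounded, it
follows that:

$$\sum_{k=1}^{\infty} \frac{\mathbb{E}[\Delta[k]^2]}{k^2} < \infty$$

This, together with (22), implies by the drift-plus-penalty
result in Proposition 2 of [?] that (with probability 1):

$$\limsup_{K \to \infty} \frac{1}{K} \sum_{k=0}^{K-1} [\mu^* T[k] - B\phi(\alpha(t_k))] \le \frac{C_0}{V}$$

Thus, for any $\epsilon > 0$ one has for all sufficiently large $K$:

$$\frac{1}{K} \sum_{k=0}^{K-1} [\mu^* T[k] - B\phi(\alpha(t_k))] \le \frac{C_0}{V} + \epsilon$$

Rearranging implies that for all sufficiently large $K$:

$$\frac{\sum_{k=0}^{K-1} B\phi(\alpha(t_k))}{\sum_{k=0}^{K-1} T[k]} \ge \mu^* - \frac{C_0/V + \epsilon}{\frac{1}{K}\sum_{k=0}^{K-1} T[k]} \ge \mu^* - (C_0/V + \epsilon)$$

where the final inequality holds because $T[k] \ge 1$ for all $k$.
Thus:

$$\liminf_{K \to \infty} \frac{\sum_{k=0}^{K-1} B\phi(\alpha(t_k))}{\sum_{k=0}^{K-1} T[k]} \ge \mu^* - (C_0/V + \epsilon)$$

The above holds for all $\epsilon > 0$. Taking a limit as $\epsilon \to 0$ implies:

$$\liminf_{K \to \infty} \frac{\sum_{k=0}^{K-1} B\phi(\alpha(t_k))}{\sum_{k=0}^{K-1} T[k]} \ge \mu^* - C_0/V$$

Notice that $\phi(\alpha(t))$ only changes at the boundary of each
frame and remains 0 within the frame. Thus, we can replace
the sum over frames $k$ by a sum over slots $t$. The desired
result follows. ■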

The theorem shows that throughput can be pushed within
$O(1/V)$ of the optimal value $\mu^*$, where $V$ can be chosen as
large as desired to ensure throughput is arbitrarily close to
optimal. The tradeoff is a queue bound that grows linearly
with $V$ according to Lemma 2, which affects the convergence
time required for the constraints to be close to the desired time
averages (as described in the proof of Lemma 1).



Fig. 1. A system with $N$ users. The shaded node for each user $n$ indicates
the current file state $F_n(t)$ of that user. There are $2^N$ different state vectors.


III. Multi-user file downloading 

This section considers a multi-user file downloading system
that consists of $N$ single-user subsystems. Each subsystem is
similar to the single-user system described in the previous section.
Specifically, for the $n$-th user (where $n \in \{1, \ldots, N\}$):

• The file state process is $F_n(t) \in \{0, 1\}$.

• The transmission decision is $\alpha_n(t) \in \mathcal{A}_n$, where $\mathcal{A}_n$ is
an abstract set of transmission options for user $n$.

• The power expenditure on slot $t$ is $p_n(\alpha_n(t))$.

• The success probability on a slot $t$ for which $F_n(t) = 1$
is $\phi_n(\alpha_n(t))$, where $\phi_n(\cdot)$ is the function that describes
file completion probability for user $n$.

• The idle period parameter is $\lambda_n > 0$.

• The average file size is $B_n$ bits.

Assume that the random variables associated with different
subsystems are mutually independent. The resulting Markov
decision problem has $2^N$ states, as shown in Fig. 1. The
transition probabilities for each active user depend on which
users are selected for transmission and on the corresponding
transmission modes. This is a restless bandit system because
there can also be transitions for non-selected users (specifically,
it is possible to transition from inactive to active).

To control the downloading process, there is a central server
with only $M$ threads ($M < N$), meaning that at most $M$ jobs
can be processed simultaneously. So at each time slot, the
server has to make decisions selecting at most $M$ out of $N$
users to transmit a portion of their files. These decisions are
further restricted by a global time average power constraint.
The goal is to maximize the aggregate throughput, which is
defined as

$$\liminf_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} \sum_{n=1}^{N} c_n B_n \phi_n(\alpha_n(t))$$

where $c_1, c_2, \ldots, c_N$ are a collection of positive weights that
can be used to prioritize users. Thus, this multi-user file
downloading problem reduces to the following:

$$\text{Max:} \quad \liminf_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} \sum_{n=1}^{N} c_n B_n \phi_n(\alpha_n(t)) \tag{23}$$

$$\text{S.t.:} \quad \limsup_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} \sum_{n=1}^{N} p_n(\alpha_n(t)) \le \beta \tag{24}$$

$$\sum_{n=1}^{N} I(\alpha_n(t)) \le M \quad \forall t \in \{0, 1, 2, \ldots\} \tag{25}$$

$$\Pr[F_n(t+1) = 1 \mid F_n(t) = 0] = \lambda_n \tag{26}$$

$$\Pr[F_n(t+1) = 0 \mid F_n(t) = 1] = \phi_n(\alpha_n(t)) \tag{27}$$

where the constraints (26)-(27) hold for all $n \in \{1, \ldots, N\}$
and $t \in \{0, 1, 2, \ldots\}$, and where $I(\cdot)$ is the indicator function
defined as:

$$I(x) = \begin{cases} 0, & \text{if } x = 0; \\ 1, & \text{otherwise.} \end{cases}$$


A. Lyapunov indexing algorithm 

This section develops our indexing algorithm for the
multi-user case using the single-user case as a stepping
stone. The major difficulty is the instantaneous constraint
$\sum_{n=1}^{N} I(\alpha_n(t)) \le M$. Temporarily neglecting this constraint,
we use Lyapunov optimization to deal with the time average
power constraint first.

We introduce a virtual queue $Q(t)$, which is again 0 at $t = 0$.
Instead of updating it on a frame basis, the server updates this
queue every slot as follows:

$$Q(t+1) = \max\left\{ Q(t) + \sum_{n=1}^{N} p_n(\alpha_n(t)) - \beta, \; 0 \right\} \tag{28}$$

Define $\mathcal{N}(t)$ as the set of users beginning their renewal frames
at time $t$, so that $F_n(t) = 1$ for all such users. In general,
$\mathcal{N}(t)$ is a subset of $\mathcal{N} = \{1, 2, \ldots, N\}$. Define $|\mathcal{N}(t)|$ as the
number of users in the set $\mathcal{N}(t)$.

At each time slot $t$, the server observes the queue state $Q(t)$
and chooses $(\alpha_1(t), \ldots, \alpha_N(t))$ in a manner similar to the
single-user case. Specifically, for each user $n \in \mathcal{N}(t)$ define:

$$g_n(\alpha_n(t)) = \frac{V c_n B_n \phi_n(\alpha_n(t)) - Q(t) p_n(\alpha_n(t))}{1 + \phi_n(\alpha_n(t))/\lambda_n} \tag{29}$$

This is similar to the expression (14) used in the single-user
optimization. Call $g_n(\alpha_n(t))$ a reward. Now define an index
for each subsystem $n$ by:

$$\gamma_n(t) = \max_{\alpha_n(t) \in \mathcal{A}_n} g_n(\alpha_n(t)) \tag{30}$$

which is the maximum possible reward one can get from the
$n$-th subsystem at time slot $t$. Thus, it is natural to define the
following myopic algorithm: Find the (at most) $M$ subsystems
in $\mathcal{N}(t)$ with the greatest rewards, and serve these with their
corresponding optimal $\alpha_n(t)$ options in $\mathcal{A}_n$ that maximize
$g_n(\alpha_n(t))$.


Algorithm 2:

• At each time slot $t$, the server observes virtual queue
state $Q(t)$ and computes the indices using (30) for all
$n \in \mathcal{N}(t)$.

• Activate the $\min[M, |\mathcal{N}(t)|]$ subsystems with greatest
indices, using their corresponding actions $\alpha_n(t) \in \mathcal{A}_n$
that maximize $g_n(\alpha_n(t))$.

• Update $Q(t)$ according to (28) at the end of each slot $t$.
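
One slot of Algorithm 2 can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: each user is represented by a hypothetical dictionary with keys `'c'`, `'B'`, `'lam'`, and `'actions'`, and the function names and numeric values are invented for the example.

```python
def lyapunov_indexing_step(Q, active_users, users, V, M):
    """Select users and actions for one slot of the indexing heuristic.

    Q: current virtual queue value.
    active_users: iterable of user ids n with F_n(t) = 1.
    users: dict n -> {'c', 'B', 'lam', 'actions'}, where 'actions' is a list
           of (phi, power) pairs that includes the idle action (0.0, 0.0).
    Returns dict n -> chosen (phi, power) for the (at most M) served users.
    """
    indexed = []
    for n in active_users:
        u = users[n]

        def reward(a, u=u):
            # Reward g_n from (29) for action a = (phi, power).
            phi, power = a
            return (V * u['c'] * u['B'] * phi - Q * power) / (1.0 + phi / u['lam'])

        best = max(u['actions'], key=reward)
        indexed.append((reward(best), n, best))  # index (30) and its maximizer
    indexed.sort(key=lambda item: item[0], reverse=True)
    return {n: action for _, n, action in indexed[:M]}

def update_queue(Q, total_power, beta):
    """Per-slot virtual queue update (28)."""
    return max(Q + total_power - beta, 0.0)

# Hypothetical three-user system.
USERS = {
    1: {'c': 1.0, 'B': 1.0, 'lam': 0.5, 'actions': [(0.0, 0.0), (0.3, 1.0)]},
    2: {'c': 2.0, 'B': 1.0, 'lam': 0.5, 'actions': [(0.0, 0.0), (0.3, 1.0)]},
    3: {'c': 1.0, 'B': 1.0, 'lam': 0.1, 'actions': [(0.0, 0.0), (0.5, 2.0)]},
}
served = lyapunov_indexing_step(Q=0.0, active_users=[1, 2, 3], users=USERS, V=10.0, M=2)
print(served)
```

The per-slot cost is one pass over the active users plus a sort, which reflects the claim that the heuristic is as simple as comparing indices of separate 1-user problems.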


B. Theoretical performance analysis 

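In this subsection, we show that the above algorithm always
satisfies the desired time average power constraint. Define:

$$p_n^{\min} = \min_{\alpha_n \in \mathcal{A}_n \setminus \{0\}} p_n(\alpha_n)$$

$$p^{\min} = \min_n p_n^{\min}$$

$$p_n^{\max} = \max_{\alpha_n \in \mathcal{A}_n \setminus \{0\}} p_n(\alpha_n)$$

$$c^{\max} = \max_n c_n$$

$$B^{\max} = \max_n B_n$$

Assume that $p^{\min} > 0$.

Lemma 3: Under the above Lyapunov indexing algorithm,
the queue $\{Q(t)\}_{t=0}^{\infty}$ is deterministically bounded. Specifically,
we have for all $t \in \{0, 1, 2, \ldots\}$:

$$Q(t) \le \max\left\{ \frac{V c^{\max} B^{\max}}{p^{\min}} + \sum_{n=1}^{N} p_n^{\max} - \beta, \; 0 \right\}$$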

Proof: First, consider the case when $\sum_{n=1}^{N} p_n^{\max} \le \beta$.
Since $Q(0) = 0$, it is clear from the updating rule (28) that
$Q(t)$ will remain 0 for all t.

Next, consider the case when $\sum_{n=1}^{N} p_n^{\max} > \beta$. We prove
the assertion by induction on t. The result trivially holds for
$t = 0$. Suppose at $t = t'$, we have:

$$Q(t') \le \frac{V c^{\max} B^{\max}}{p^{\min}} + \sum_{n=1}^{N} p_n^{\max} - \beta.$$

We are going to prove that the same statement holds for $t = t' + 1$.
We further divide it into two cases:

1) $Q(t') \le \frac{V c^{\max} B^{\max}}{p^{\min}}$. In this case, since the queue
increases by at most $\sum_{n=1}^{N} p_n^{\max} - \beta$ on one slot, we
have:

$$Q(t'+1) \le \frac{V c^{\max} B^{\max}}{p^{\min}} + \sum_{n=1}^{N} p_n^{\max} - \beta.$$

2) $\frac{V c^{\max} B^{\max}}{p^{\min}} < Q(t') \le \frac{V c^{\max} B^{\max}}{p^{\min}} + \sum_{n=1}^{N} p_n^{\max} - \beta$.
In this case, since $\phi_n(\alpha_n(t')) \le 1$, there is no possibility
that $V c_n B_n \phi_n(\alpha_n(t')) \ge Q(t') p_n(\alpha_n(t'))$ unless
$\alpha_n(t') = 0$. Thus, the Lyapunov indexing algorithm of
minimizing (29) chooses $\alpha_n(t') = 0$ for all n. Thus, all
indices are 0. This implies that $Q(t')$ cannot increase,
and we get $Q(t'+1) \le \frac{V c^{\max} B^{\max}}{p^{\min}} + \sum_{n=1}^{N} p_n^{\max} - \beta$. ■

Theorem 2: The proposed Lyapunov indexing algorithm
achieves the constraint:

$$\limsup_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} \sum_{n=1}^{N} p_n(\alpha_n(t)) \le \beta.$$

Proof: Using Lemma 1 under the special case that each
frame only occupies one slot, we get that if $\{Q(t)\}_{t=0}^{\infty}$ is
deterministically bounded, then the time average constraint is
satisfied. Then, according to Lemma 3, we are done. ■
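The step from a bounded virtual queue to the time average constraint can be made explicit with a short telescoping argument. The derivation below is a sketch that assumes the update rule (28) has the standard form $Q(t+1) = \max[Q(t) + \sum_n p_n(\alpha_n(t)) - \beta, 0]$ with $Q(0) = 0$; the symbol $Q^{\max}$ is introduced here only as shorthand for the deterministic bound of Lemma 3.

```latex
% Assumed form of (28): Q(t+1) = \max[ Q(t) + \sum_n p_n(\alpha_n(t)) - \beta, 0 ].
\begin{align*}
Q(t+1) &\ge Q(t) + \sum_{n=1}^{N} p_n(\alpha_n(t)) - \beta
  && \text{(drop the $\max[\cdot, 0]$)} \\
Q(T) - Q(0) &\ge \sum_{t=0}^{T-1} \sum_{n=1}^{N} p_n(\alpha_n(t)) - T\beta
  && \text{(sum over $t = 0, \ldots, T-1$)} \\
\frac{1}{T} \sum_{t=0}^{T-1} \sum_{n=1}^{N} p_n(\alpha_n(t))
  &\le \beta + \frac{Q(T)}{T} \le \beta + \frac{Q^{\max}}{T}
  && \text{(use $Q(0) = 0$ and $Q(T) \le Q^{\max}$)}
\end{align*}
% Letting T \to \infty, the term Q^{\max}/T vanishes and the constraint follows.
```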


IV. Multi-user optimality in a special case 

In general, it is very difficult to prove optimality of the 
above multi-user algorithm. There are mainly two reasons. 
The first reason is that multiple users might renew themselves
asynchronously, making it difficult to define a “renewal frame” 
for the whole system. Thus, the proof technique in Theorem 1 
is infeasible. The second reason is that, even without the time 
average constraint, the problem degenerates into a standard 
restless bandit problem where the optimality of indexing is 
not guaranteed. 

This section considers a special case of the multi-user file
downloading problem where the Lyapunov indexing algorithm
is provably optimal. The special case has no time average
power constraint. Further, for each user $n \in \{1, \ldots, N\}$:

• Each file consists of a random number of fixed length
packets with mean $1/\mu_n$.

• The decision set is $\mathcal{A}_n = \{0, 1\}$, where 0 stands for “idle”
and 1 stands for “download.” If $\alpha_n(t) = 1$, then user n
successfully downloads a single packet.

• Idle time is geometrically distributed with mean $1/\lambda_n$.

• The special case $\mu_n = 1 - \lambda_n$ is assumed.

The assumption that the file length and idle time parameters
$\mu_n$ and $\lambda_n$ satisfy $\mu_n = 1 - \lambda_n$ is restrictive. However,
when this assumption holds, there exists a certain queueing
system which admits exactly the same Markov dynamics
as the system considered here (described in Section IV-A
below). More importantly, it allows us to implement the
stochastic coupling idea to prove optimality.

The goal is to maximize the sum throughput (in units of
packets/slot), which is defined as:

$$\liminf_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} \sum_{n=1}^{N} B_n \phi_n(\alpha_n(t)). \tag{31}$$


In this special case, the multi-user file downloading problem
reduces to the following:

$$\text{Max:} \quad \liminf_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} \sum_{n=1}^{N} \alpha_n(t) \tag{32}$$

$$\text{S.t.:} \quad \sum_{n=1}^{N} \alpha_n(t) \le M \quad \forall t \in \{0, 1, 2, \ldots\} \tag{33}$$

$$\alpha_n(t) \in \{0, F_n(t)\} \tag{34}$$

$$\Pr[F_n(t+1) = 1 \mid F_n(t) = 0] = \lambda_n \tag{35}$$

$$\Pr[F_n(t+1) = 0 \mid F_n(t) = 1] = \alpha_n(t)(1 - \lambda_n) \tag{36}$$

where the equality (36) uses the fact that $\mu_n = 1 - \lambda_n$. A
picture that illustrates the Markov structure of these constraints
is given in Fig. 2.

Fig. 2. Markovian dynamics of the n-th system.

A. A system with N single-buffer queues 

The above model, with the assumption $\mu_n = 1 - \lambda_n$, is
structurally equivalent to the following: Consider a system of
N single-buffer queues, M servers, and independent Bernoulli
packet arrivals with rates $\lambda_n$ to each queue $n \in \{1, \ldots, N\}$.
This considers packet arrivals rather than file arrivals, so
there are no file length variables and no $\mu_n$ parameters in this
interpretation. Let $A(t) = (A_1(t), \ldots, A_N(t))$ be the binary-valued
vector of packet arrivals on slot t, assumed to be i.i.d.
over slots and independent in each coordinate. Assume all
packets have the same size and each queue has a single buffer
that can store just one packet. Let $F_n(t)$ be 1 if queue n has
a packet at the beginning of slot t, and 0 else. Each server
can transmit at most 1 packet per slot. Let $\alpha_n(t)$ be 1 if
queue n is served on slot t, and 0 else. An arrival $A_n(t)$
occurs at the end of slot t and is accepted only if queue n
is empty at the end of the slot (such as when it was served
on that slot). Packets that are not accepted are dropped. The
Markov dynamics are described by the same figure as before,
namely, Fig. 2. Further, the problem of maximizing throughput
is given by the same equations (32)-(36). Thus, although the
variables of the two problems have different interpretations,
the problems are structurally equivalent. For simplicity of
exposition, the remainder of this section uses this single-buffer
queue interpretation.
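The slot-by-slot dynamics of the single-buffer queue system can be illustrated with a small simulator. This is a minimal sketch, not the simulation code used in this paper: the function name, the `policy` callback signature, and the seed handling are assumptions made for the example. Service happens first, and arrivals at the end of the slot are accepted only by queues that are empty at that moment.

```python
import random

def simulate_single_buffer(lams, M, policy, slots, seed=0):
    """Simulate N single-buffer queues with M servers (illustrative sketch).

    lams   : arrival probabilities lambda_n, one per queue
    policy : hypothetical callback, policy(nonempty, lams) -> ordered list of
             non-empty queues to serve; at most M of them are actually served
    Returns the empirical throughput in packets per slot.
    """
    rng = random.Random(seed)
    N = len(lams)
    F = [0] * N                      # buffer states F_n(t), initially empty
    sent = 0
    for _ in range(slots):
        nonempty = [n for n in range(N) if F[n] == 1]
        served = policy(nonempty, lams)[:M]
        for n in served:             # transmitting empties the buffer
            F[n] = 0
        sent += len(served)
        for n in range(N):           # arrivals occur at the end of the slot
            arrival = rng.random() < lams[n]
            if arrival and F[n] == 0:
                F[n] = 1             # accepted; arrivals to full buffers are dropped
    return sent / slots

def index_order(nonempty, lams):
    # A simple work-conserving rule: serve in increasing queue index.
    return list(nonempty)
```

For instance, `simulate_single_buffer([0.5, 0.25], 1, index_order, 100000)` gives an empirical estimate of the throughput when queue 1 has strict priority over queue 2.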


B. Optimality of the indexing algorithm 

Since there is no power constraint, for any $V > 0$ the
Lyapunov indexing policy (30) in Section III-A reduces to
the following (using $c_n = 1$, $Q(t) = 0$): If there are fewer
than M non-empty queues, serve all of them. Else, serve the
M non-empty queues with the largest values of $\gamma_n$, where:

1

Thus, the Lyapunov indexing algorithm in this context reduces
to serving the (at most M) non-empty queues with the largest
$\lambda_n$ values each time slot. For the remainder of this section,
this is called the Max-$\lambda$ policy. The following theorem shows
that Max-$\lambda$ is optimal in this context.
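The selection rule just described is short enough to state as code. The sketch below assumes buffer states and arrival rates are given as plain Python lists indexed by queue; the function name is illustrative and ties are broken by queue index, which the text does not specify.

```python
def max_lambda_policy(buffers, lams, M):
    """Return the queues served on this slot under the Max-lambda rule (sketch).

    buffers : list of 0/1 buffer states F_n(t)
    lams    : list of arrival rates lambda_n
    M       : number of servers
    """
    nonempty = [n for n, f in enumerate(buffers) if f == 1]
    # Serve the non-empty queues with the largest lambda_n first.
    nonempty.sort(key=lambda n: lams[n], reverse=True)
    # If fewer than M queues are non-empty, all of them are served.
    return nonempty[:M]
```

For example, with buffer states `[1, 0, 1, 1]`, rates `[0.2, 0.9, 0.5, 0.7]` and two servers, the rule serves queues 3 and 2 (zero-based), leaving the packet in the queue with the smallest arrival rate.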

Theorem 3: The Max-$\lambda$ policy is optimal for the problem
(32)-(36). In particular, under the single-buffer queue interpretation,
it maximizes throughput over all policies that transmit
on each slot t without knowledge of the arrival vector $A(t)$.

For the N single-buffer queue interpretation, the total
throughput is equal to the raw arrival rate minus the
packet drop rate. Intuitively, the reason Max-$\lambda$ is optimal is
that it chooses to leave packets in the queues that are least
likely to induce packet drops. An example comparison of the
throughput gap between Max-$\lambda$ and Min-$\lambda$ policies is given
in Appendix A.

The proof of Theorem 3 is divided into two parts. The first
part uses stochastic coupling techniques to prove that Max-$\lambda$
dominates all alternative work-conserving policies. A policy
is work-conserving if it does not allow any server to be idle
when it could be used to serve a non-empty queue. The second
part of the proof shows that throughput cannot be increased
by considering non-work-conserving policies.

C. Preliminaries on stochastic coupling 

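Consider two discrete time processes $\mathcal{X} = \{X(t)\}_{t=0}^{\infty}$ and
$\mathcal{Y} = \{Y(t)\}_{t=0}^{\infty}$. The notation $\mathcal{X} =_{st} \mathcal{Y}$ means that $\mathcal{X}$ and
$\mathcal{Y}$ are stochastically equivalent, in that they are described by
the same probability law. Formally, this means that their joint
distributions are the same, so for all $t \in \{0, 1, 2, \ldots\}$ and all
$(z_0, \ldots, z_t) \in \mathbb{R}^{t+1}$:

$$\Pr[X(0) \le z_0, \ldots, X(t) \le z_t] = \Pr[Y(0) \le z_0, \ldots, Y(t) \le z_t].$$

The notation $\mathcal{X} \le_{st} \mathcal{Y}$ means that $\mathcal{X}$ is stochastically less than
or equal to $\mathcal{Y}$, as defined by the following theorem.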

Theorem 4: ([14]) The following three statements are equivalent:

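1) $\mathcal{X} \le_{st} \mathcal{Y}$.

2) $\Pr[g(X(0), X(1), \ldots, X(t)) > z] \le \Pr[g(Y(0), Y(1), \ldots, Y(t)) > z]$
for all $t \in \mathbb{Z}_+$, all z, and for all functions
$g : \mathbb{R}^{t+1} \to \mathbb{R}$ that are measurable and
nondecreasing in all coordinates.

3) There exist two stochastic processes $\mathcal{X}'$ and $\mathcal{Y}'$ on a
common probability space that satisfy $\mathcal{X} =_{st} \mathcal{X}'$, $\mathcal{Y} =_{st} \mathcal{Y}'$,
and $X'(t) \le Y'(t)$ for every $t \in \mathbb{Z}_+$.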
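Statement 3) is the one used in the proofs that follow, so a concrete instance may help. The sketch below uses the textbook example of two Bernoulli variables with success probabilities $p \le q$: driving both from one shared uniform sample places them on a common probability space where the ordering holds on every sample path while each marginal law is preserved. The function name and the use of Python's `random` module are choices made for this illustration only.

```python
import random

def coupled_bernoulli(p, q, rng):
    """Sample (X', Y') with X' ~ Bernoulli(p), Y' ~ Bernoulli(q), built from
    one shared uniform draw. If p <= q then X' <= Y' on every draw."""
    u = rng.random()          # common source of randomness
    return int(u < p), int(u < q)

rng = random.Random(1)
samples = [coupled_bernoulli(0.3, 0.6, rng) for _ in range(10000)]
mean_x = sum(x for x, _ in samples) / len(samples)   # close to 0.3
mean_y = sum(y for _, y in samples) / len(samples)   # close to 0.6
```

Sampling the two variables independently would give the same marginals but would not guarantee the sample-path ordering, which is exactly what the coupling adds.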

The following additional notation is used in the proof of
Theorem 3.

• Arrival vector $\{A(t)\}_{t=0}^{\infty}$, where $A(t) = [A_1(t)\; A_2(t)\; \cdots\; A_N(t)]$.
Each $A_n(t)$ is an independent binary
random variable that takes 1 w.p. $\lambda_n$ and 0 w.p. $1 - \lambda_n$.

• Buffer state vector $\{F(t)\}_{t=0}^{\infty}$, where $F(t) = [F_1(t)\; F_2(t)\; \cdots\; F_N(t)]$.
So $F_n(t) = 1$ if queue n has a packet
at the beginning of slot t, and $F_n(t) = 0$ else.

• Total packet process $\mathcal{U} = \{U(t)\}_{t=0}^{\infty}$, where
$U(t) = \sum_{n=1}^{N} F_n(t)$ represents the total number of packets in
the system on slot t. Since each queue can hold at most
one packet, we have $0 \le U(t) \le N$ for all slots t.

D. Stochastic ordering of buffer state process 

The next lemma is the key to proving Theorem 3. The
lemma considers the multi-queue system with a fixed but
arbitrary initial buffer state $F(0)$. The arrival process $A(t)$
is as defined above. Let $\mathcal{U}^{\text{max-}\lambda}$ be the total packet process
under the Max-$\lambda$ policy. Let $\mathcal{U}^{\pi}$ be the corresponding process
starting from the same initial state $F(0)$ and having the same
arrivals $A(t)$, but with an arbitrary work-conserving policy $\pi$.

Lemma 4: The total packet processes $\mathcal{U}^{\pi}$ and $\mathcal{U}^{\text{max-}\lambda}$ satisfy:

$$\mathcal{U}^{\pi} \le_{st} \mathcal{U}^{\text{max-}\lambda} \tag{37}$$

Proof: Without loss of generality, assume the queues
are sorted so that $\lambda_n \le \lambda_{n+1}$, $n = 1, \ldots, N-1$.

Define $\{F^{\pi}(t)\}_{t=0}^{\infty}$ as the buffer state vector under policy
$\pi$. Define $\{F^{\text{max-}\lambda}(t)\}_{t=0}^{\infty}$ as the corresponding buffer states
under the Max-$\lambda$ policy. By assumption the initial states satisfy
$F^{\pi}(0) = F^{\text{max-}\lambda}(0)$. Next, we construct a third process $\mathcal{U}^{\lambda}$
with a modified arrival vector process $\{A^{\lambda}(t)\}_{t=0}^{\infty}$ and a
corresponding buffer state vector $\{F^{\lambda}(t)\}_{t=0}^{\infty}$ (with the same
initial state $F^{\lambda}(0) = F^{\pi}(0)$), which satisfies:

1) $\mathcal{U}^{\lambda}$ is also generated from the Max-$\lambda$ policy.

2) $\mathcal{U}^{\lambda} =_{st} \mathcal{U}^{\text{max-}\lambda}$. Since the total packet process is
completely determined by the initial state, the scheduling
policy, and the arrival process, it suffices to show that
$\{A^{\lambda}(t)\}_{t=0}^{\infty}$ and $\{A(t)\}_{t=0}^{\infty}$ have the same probability
law.

3) $U^{\pi}(t) \le U^{\lambda}(t)$ $\forall t \ge 0$.

Since the arrival process $A(t)$ is i.i.d. over slots, in order to
guarantee 2) and 3), it is sufficient to construct $A^{\lambda}(t)$ coupled
with $A(t)$ for each t so that the following two properties hold
for all $t \ge 0$:

• The random variables $A(t)$ and $A^{\lambda}(t)$ have the same
probability law. Specifically, both produce arrivals according
to Bernoulli processes that are independent
over queues and over time, with $\Pr[A_n(t) = 1] = \Pr[A^{\lambda}_n(t) = 1] = \lambda_n$
for all $n \in \{1, \ldots, N\}$.

• For all $j \in \{1, 2, \ldots, N\}$,

$$\sum_{n=1}^{j} F^{\pi}_n(t) \le \sum_{n=1}^{j} F^{\lambda}_n(t). \tag{38}$$

The construction is based on an induction.

At $t = 0$ we have $F^{\lambda}(0) = F^{\pi}(0)$. Thus, (38) naturally
holds for $t = 0$. Now fix $\tau \ge 0$ and assume (38) holds
for all slots up to time $t = \tau$. If $\tau \ge 1$, further assume the
arrivals $\{A^{\lambda}(t)\}_{t=0}^{\tau-1}$ have been constructed to have the same
probability law as $\{A(t)\}_{t=0}^{\tau-1}$. Since arrivals on slot $\tau$ occur
at the end of slot $\tau$, the arrivals $A^{\lambda}(\tau)$ must be constructed.
We are going to show there exists an $A^{\lambda}(\tau)$ that is coupled
with $A(\tau)$ so that it has the same probability law and it also
ensures (38) holds for $t = \tau + 1$.

Since arrivals occur after the transmitting action, we divide
the analysis into two parts. First, we analyze the temporary
buffer states after the transmitting action but before arrivals
occur. Then, we define arrivals $A^{\lambda}(\tau)$ at the end of slot $\tau$ to
achieve the desired coupling.

Define $\tilde{F}^{\pi}(\tau)$ and $\tilde{F}^{\lambda}(\tau)$ as the temporary buffer states right
after the transmitting action at slot $\tau$ but before arrivals occur
under policy $\pi$ and policy Max-$\lambda$, respectively. Thus, for each
queue $n \in \{1, \ldots, N\}$:

$$\tilde{F}^{\pi}_n(\tau) = F^{\pi}_n(\tau) - \alpha^{\pi}_n(\tau) \tag{39}$$

$$\tilde{F}^{\lambda}_n(\tau) = F^{\lambda}_n(\tau) - \alpha^{\lambda}_n(\tau) \tag{40}$$

where $\alpha^{\pi}_n(\tau)$ and $\alpha^{\lambda}_n(\tau)$ are the slot $\tau$ decisions under policy
$\pi$ and Max-$\lambda$, respectively. Since (38) holds for $j = N$ on
slot $\tau$, the total number of packets at the start of slot $\tau$ under
policy $\pi$ is less than or equal to that under Max-$\lambda$. Since both
policies $\pi$ and Max-$\lambda$ are work-conserving, it is impossible
for policy $\pi$ to transmit more packets than Max-$\lambda$ during slot
$\tau$. This implies:

$$\sum_{n=1}^{N} \tilde{F}^{\pi}_n(\tau) \le \sum_{n=1}^{N} \tilde{F}^{\lambda}_n(\tau). \tag{41}$$

Indeed, if $\pi$ transmits the same number of packets as Max-$\lambda$
on slot $\tau$, then (41) clearly holds. On the other hand, if $\pi$
transmits fewer packets than Max-$\lambda$, it must transmit fewer
than M packets (since M is the number of servers). In this
case, the work-conserving nature of $\pi$ implies that all non-empty
queues were served, so that $\tilde{F}^{\pi}_n(\tau) = 0$ for all n and
(41) again holds. We now claim the following holds:

Lemma 5:

$$\sum_{n=1}^{j} \tilde{F}^{\pi}_n(\tau) \le \sum_{n=1}^{j} \tilde{F}^{\lambda}_n(\tau) \quad \forall j \in \{1, 2, \ldots, N\}. \tag{42}$$

Proof: See Appendix B. ■

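Now let $j^{\pi}(l)$ and $j^{\lambda}(l)$ be the subscript of the $l$-th empty
temporary buffer (with order starting from the first queue)
corresponding to $\tilde{F}^{\pi}(\tau)$ and $\tilde{F}^{\lambda}(\tau)$, respectively. It follows
from (42) that the $\pi$ system on slot $\tau$ has at least as many
empty temporary buffer states as the Max-$\lambda$ policy, and:

$$j^{\pi}(l) \le j^{\lambda}(l) \quad \forall l \in \{1, 2, \ldots, K(\tau)\} \tag{43}$$

where $K(\tau) \le N$ is the number of empty temporary buffer
states under Max-$\lambda$ at time slot $\tau$. Since $\lambda_i \le \lambda_j$ if and only
if $i \le j$, (43) further implies that

$$\lambda_{j^{\pi}(l)} \le \lambda_{j^{\lambda}(l)} \quad \forall l \in \{1, 2, \ldots, K(\tau)\}. \tag{44}$$

Now construct the arrival vector $A^{\lambda}(\tau)$ for the system with
the Max-$\lambda$ policy in the following way:

$$A_{j^{\pi}(l)}(\tau) = 1 \;\Rightarrow\; A^{\lambda}_{j^{\lambda}(l)}(\tau) = 1 \text{ w.p. } 1 \tag{45}$$

$$A_{j^{\pi}(l)}(\tau) = 0 \;\Rightarrow\; A^{\lambda}_{j^{\lambda}(l)}(\tau) = \begin{cases} 1 & \text{w.p. } \dfrac{\lambda_{j^{\lambda}(l)} - \lambda_{j^{\pi}(l)}}{1 - \lambda_{j^{\pi}(l)}} \\[2mm] 0 & \text{w.p. } \dfrac{1 - \lambda_{j^{\lambda}(l)}}{1 - \lambda_{j^{\pi}(l)}} \end{cases} \tag{46}$$

Notice that (46) uses valid probability distributions because of
(44). This establishes the slot $\tau$ arrivals for the Max-$\lambda$ policy
for all of its $K(\tau)$ queues with empty temporary buffer states.
The slot $\tau$ arrivals for its queues with non-empty temporary
buffers will be dropped and hence do not affect the queue
states on slot $\tau + 1$. Thus, we define arrivals $A^{\lambda}_j(\tau)$ to be
independent of all other quantities and to be Bernoulli with
$\Pr[A^{\lambda}_j(\tau) = 1] = \lambda_j$ for all j in the set:

$$j \in \{1, 2, \ldots, N\} \setminus \{j^{\lambda}(1), \ldots, j^{\lambda}(K(\tau))\}.$$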


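Now we verify that $A(\tau)$ and $A^{\lambda}(\tau)$ have the same probability
law. First condition on knowledge of $K(\tau)$ and the particular
$j^{\pi}(l)$ and $j^{\lambda}(l)$ values for $l \in \{1, \ldots, K(\tau)\}$. All queues j
with non-empty temporary buffer states on slot $\tau$ under Max-$\lambda$
were defined to have arrivals $A^{\lambda}_j(\tau)$ as independent Bernoulli
variables with $\Pr[A^{\lambda}_j(\tau) = 1] = \lambda_j$. It remains to verify those
queues within $\{j^{\lambda}(1), \ldots, j^{\lambda}(K(\tau))\}$. According to (46), for
any queue $j^{\lambda}(l)$ in the set $\{j^{\lambda}(1), \ldots, j^{\lambda}(K(\tau))\}$, it follows that

$$\Pr\left[A^{\lambda}_{j^{\lambda}(l)}(\tau) = 0\right] = \left(1 - \lambda_{j^{\pi}(l)}\right) \frac{1 - \lambda_{j^{\lambda}(l)}}{1 - \lambda_{j^{\pi}(l)}} = 1 - \lambda_{j^{\lambda}(l)}$$

and so $\Pr[A^{\lambda}_j(\tau) = 1] = \lambda_j$ for all $j \in \{j^{\lambda}(l)\}_{l=1}^{K(\tau)}$.
Further, mutual independence of $\{A_{j^{\pi}(l)}(\tau)\}_{l=1}^{K(\tau)}$ implies mutual
independence of $\{A^{\lambda}_{j^{\lambda}(l)}(\tau)\}_{l=1}^{K(\tau)}$. Finally, these quantities
are conditionally independent of events before slot $\tau$, given
knowledge of $K(\tau)$ and the particular $j^{\pi}(l)$ and $j^{\lambda}(l)$ values
for $l \in \{1, \ldots, K(\tau)\}$. Thus, conditioned on this knowledge,
$A(\tau)$ and $A^{\lambda}(\tau)$ have the same probability law. This holds
for all possible values of the conditional knowledge $K(\tau)$ and
$j^{\pi}(l)$ and $j^{\lambda}(l)$. It follows that $A(\tau)$ and $A^{\lambda}(\tau)$ have the same
(unconditioned) probability law.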
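The marginal check above can be reproduced numerically. The following sketch assumes the coupling rule of (45)-(46) for a single pair of queues with rates `lam_pi` $\le$ `lam_max` (both names are illustrative): an arrival to the $\pi$ system always produces an arrival to the coupled Max-$\lambda$ queue, and a non-arrival is upgraded to an arrival with probability $(\lambda_{j^{\lambda}(l)} - \lambda_{j^{\pi}(l)})/(1 - \lambda_{j^{\pi}(l)})$. Exact rational arithmetic is used so the marginal can be compared without rounding.

```python
from fractions import Fraction

def coupled_arrival_law(lam_pi, lam_max):
    """Pr[coupled arrival = 1] implied by rules (45)-(46), for lam_pi <= lam_max."""
    if lam_pi == 1:
        return Fraction(1)                       # the pi-arrival is always 1
    p_up = (lam_max - lam_pi) / (1 - lam_pi)     # upgrade probability in (46)
    return lam_pi * 1 + (1 - lam_pi) * p_up      # total probability

def sample_coupled(a_pi, lam_pi, lam_max, rng):
    """Draw the coupled arrival given the realized pi-system arrival a_pi."""
    if a_pi == 1 or lam_pi >= 1:
        return 1                                 # rule (45)
    p_up = (lam_max - lam_pi) / (1 - lam_pi)
    return int(rng.random() < p_up)              # rule (46)
```

For `lam_pi = 1/4` and `lam_max = 1/2` the upgrade probability is 1/3 and the resulting marginal is exactly 1/2, matching the required arrival rate, while every sample satisfies the ordering between the two arrivals by construction.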

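Finally, we show that the coupling relations (45) and (46)
produce such $F^{\lambda}(\tau + 1)$ satisfying

$$\sum_{n=1}^{j} F^{\pi}_n(\tau + 1) \le \sum_{n=1}^{j} F^{\lambda}_n(\tau + 1), \quad \forall j \in \{1, 2, \ldots, N\}. \tag{47}$$

According to (45) and (46),

$$A_{j^{\pi}(l)}(\tau) \le A^{\lambda}_{j^{\lambda}(l)}(\tau), \quad \forall l \in \{1, \ldots, K(\tau)\},$$

thus,

$$\sum_{i=1}^{l} A_{j^{\pi}(i)}(\tau) \le \sum_{i=1}^{l} A^{\lambda}_{j^{\lambda}(i)}(\tau), \quad \forall l \in \{1, \ldots, K(\tau)\}. \tag{48}$$

Pick any $j \in \{1, 2, \ldots, N\}$. Let $l^{\pi}$ be the number of empty
temporary buffers within the first j queues under policy $\pi$, i.e.

$$l^{\pi} = \max_{j^{\pi}(l) \le j} l.$$

Similarly define:

$$l^{\lambda} = \max_{j^{\lambda}(l) \le j} l.$$

Then, it follows:

$$\sum_{n=1}^{j} F^{\pi}_n(\tau + 1) = \sum_{n=1}^{j} \tilde{F}^{\pi}_n(\tau) + \sum_{i=1}^{l^{\pi}} A_{j^{\pi}(i)}(\tau) \tag{49}$$

$$\sum_{n=1}^{j} F^{\lambda}_n(\tau + 1) = \sum_{n=1}^{j} \tilde{F}^{\lambda}_n(\tau) + \sum_{i=1}^{l^{\lambda}} A^{\lambda}_{j^{\lambda}(i)}(\tau) \tag{50}$$

We know that $l^{\pi} \ge l^{\lambda}$. So there are two cases:

• If $l^{\pi} = l^{\lambda}$, then from (49):

$$\sum_{n=1}^{j} F^{\pi}_n(\tau + 1) = \sum_{n=1}^{j} \tilde{F}^{\pi}_n(\tau) + \sum_{i=1}^{l^{\pi}} A_{j^{\pi}(i)}(\tau) \le \sum_{n=1}^{j} \tilde{F}^{\lambda}_n(\tau) + \sum_{i=1}^{l^{\lambda}} A^{\lambda}_{j^{\lambda}(i)}(\tau) = \sum_{n=1}^{j} F^{\lambda}_n(\tau + 1)$$

where the inequality follows from (42) and from (48)
with $l = l^{\lambda}$. Thus, (47) holds.

• If $l^{\pi} > l^{\lambda}$, then from (49):

$$\sum_{n=1}^{j} F^{\pi}_n(\tau + 1) = \sum_{n=1}^{j} \tilde{F}^{\pi}_n(\tau) + \sum_{i=1}^{l^{\lambda}} A_{j^{\pi}(i)}(\tau) + \sum_{i=l^{\lambda}+1}^{l^{\pi}} A_{j^{\pi}(i)}(\tau)$$

$$\le \sum_{n=1}^{j} \tilde{F}^{\lambda}_n(\tau) + \sum_{i=1}^{l^{\lambda}} A_{j^{\pi}(i)}(\tau)$$

$$\le \sum_{n=1}^{j} \tilde{F}^{\lambda}_n(\tau) + \sum_{i=1}^{l^{\lambda}} A^{\lambda}_{j^{\lambda}(i)}(\tau) = \sum_{n=1}^{j} F^{\lambda}_n(\tau + 1),$$

where the first inequality follows from the fact that

$$\sum_{i=l^{\lambda}+1}^{l^{\pi}} A_{j^{\pi}(i)}(\tau) \le l^{\pi} - l^{\lambda} = \sum_{n=1}^{j} \tilde{F}^{\lambda}_n(\tau) - \sum_{n=1}^{j} \tilde{F}^{\pi}_n(\tau),$$

and the second inequality follows from (48).

Thus, (38) holds for $t = \tau + 1$ and the induction step is done.

■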

Corollary 1: The Max-$\lambda$ policy maximizes throughput
within the class of work-conserving policies.

Proof: Let $S^{\pi}(t)$ be the number of packets transmitted
under any work-conserving policy $\pi$ on slot t, and let $S^{\text{max-}\lambda}(t)$
be the corresponding process under policy Max-$\lambda$. Lemma 4
implies $\mathcal{U}^{\pi} \le_{st} \mathcal{U}^{\text{max-}\lambda}$. Then:

$$E[S^{\pi}(t)] = E[\min[U^{\pi}(t), M]] \le E[\min[U^{\text{max-}\lambda}(t), M]] = E[S^{\text{max-}\lambda}(t)]$$

where the inequality follows from Theorem 4, with the understanding
that $g(U(0), \ldots, U(t)) = \min[U(t), M]$ is a function
that is nondecreasing in all coordinates. ■


TABLE I

Problem parameters

User   $\lambda_n$   $\mu_n$    $\phi_n(1)$   $c_n$     $p_n(1)$
1      0.0028    0.5380   0.4842    4.7527   3.9504
2      0.4176    0.5453   0.4908    2.0681   3.7391
3      0.0888    0.5044   0.4540    2.8656   3.5753
4      0.3181    0.6103   0.5493    2.4605   2.1828
5      0.4151    0.9839   0.8855    4.5554   3.1982
6      0.2546    0.5975   0.5377    3.9647   3.5290
7      0.1705    0.5517   0.4966    1.5159   2.5226
8      0.2109    0.7597   0.6837    3.6364   2.5376


E. Extending to non-work-conserving policies 

Corollary 1 establishes optimality of Max-$\lambda$ over the class
of all work-conserving policies. To complete the proof of
Theorem 3, it remains to show that throughput cannot be
increased by allowing for non-work-conserving policies. It
suffices to show that for any non-work-conserving policy, there
exists a work-conserving policy that gets the same or better
throughput. The proof is straightforward and we give only a
proof sketch for brevity. Consider any non-work-conserving
policy $\pi$, and let $F^{\pi}_n(t)$ be its buffer state process on slot t for
each queue n. For the same initial buffer state and arrival
process, define the work-conserving policy $\pi'$ as follows:
Every slot t, policy $\pi'$ initially allocates the M servers to
exactly the same queues as policy $\pi$. However, if some of these
queues are empty under policy $\pi'$, it reallocates those servers
to any non-empty queues that are not yet allocated servers (in
keeping with the work-conserving property). Let $F^{\pi'}_n(t)$ be
the buffer state process for queue n under policy $\pi'$. It is not
difficult to show that $F^{\pi}_n(t) \ge F^{\pi'}_n(t)$ for all queues n and
all slots t. Therefore, on every slot t, the amount of blocked
arrivals under policy $\pi$ is always greater than or equal to that
under policy $\pi'$. This implies the throughput under policy $\pi$
is less than or equal to that of policy $\pi'$.

V. Simulation experiments 

In this section, we demonstrate near optimality of the multi-user
Lyapunov indexing algorithm by extensive simulations.
In the first part, we simulate the case in which the file length
distribution is geometric, and show that the suboptimality gap
is extremely small. In the second part, we test the robustness
of our algorithm for more general scenarios in which the
file length distribution is not geometric. For simplicity, it is
assumed throughout that all transmissions send a fixed sized
packet, all files are an integer number of these packets, and
that decisions $\alpha_n(t) \in \mathcal{A}_n$ affect the success probability of
the transmission as well as the power expenditure.

A. Lyapunov indexing with geometric file length 

In the first simulation we use N = 8, M = 4 with action
set $\mathcal{A}_n = \{0, 1\}$ $\forall n$. The settings are generated randomly and
specified in Table I, and the constraint is $\beta = 5$.

The algorithm is run for 1 million slots in each trial and each
point is the average of 100 trials. We compare the performance
of our algorithm with the optimal randomized policy. The
optimal policy is computed by constructing composite states
(i.e. if there are three users where user 1 is at state 0, user 2 is
at state 1 and user 3 is at state 1, we view 011 as a composite
state), and then reformulating this MDP into a linear program
(see [15]) with 5985 variables and 258 constraints.

In Fig. 3, we show that as our tradeoff parameter V gets
larger, the objective value approaches the optimal value and
achieves a near optimal performance. Fig. 4 and Fig. 5 show
that V also affects the virtual queue size and the constraint gap.
As V gets larger, the average virtual queue size becomes larger
and the gap becomes smaller. We also plot the upper bound of
queue size we derived from Lemma 3 in Fig. 5, demonstrating
that the queue is bounded. In order to show that V is indeed a
trade-off parameter affecting the convergence time, we plotted
Fig. 6. It can be seen from the figure that as V gets larger,
the number of time slots needed for the running average to
roughly converge to the optimal power expenditure becomes
larger.



Fig. 3. Throughput versus tradeoff parameter V.

Fig. 4. The time average power consumption versus tradeoff parameter V.

Fig. 5. Average virtual queue backlog versus tradeoff parameter V.

Fig. 6. Running average power consumption versus tradeoff parameter V.

In the second simulation, we explore the parameter space
and demonstrate that in general the suboptimality gap of our
algorithm is negligible. First, we define the relative error as
the following:

$$\text{relative error} = \frac{|OBJ - OPT|}{OPT} \tag{51}$$

where OBJ is the objective value after running 1 million
slots of our algorithm and OPT is the optimal value. We
first explore the system parameters by letting the $\lambda_n$'s and $\mu_n$'s
take random numbers within 0 and 1, letting $c_n$ take random
numbers within 1 and 5, choosing V = 70 and fixing
the remaining parameters the same as the last experiment.
We conduct 1000 Monte-Carlo experiments and calculate the
average relative error, which is 0.00083.

Next, we explore the control parameters by letting the $p_n(1)$
values take random numbers within 2 and 4, letting $\phi_n(1)/\mu_n$
take random numbers between 0 and 1, choosing V = 70,
and fixing the remaining parameters the same as the first
simulation. The relative error is 0.00057. Both experiments
show that the suboptimality gap is extremely small.


B. Lyapunov indexing with non-memoryless file lengths

In this part, we test the sensitivity of the algorithm to
different file length distributions. In particular, the uniform
distribution and the Poisson distribution are implemented respectively,
while our algorithm still treats them as a geometric
distribution with the same mean. We then compare their
throughputs with the geometric case.

TABLE II

Problem parameters under geometric, uniform and Poisson distribution

User   $\mu_n$   Unif. interval   Poiss. mean   $\lambda_n$    $\phi_n(1)$   $c_n$     $p_n(1)$
1      1/3     [1,5]            3             0.4955   0.1832    4.3261   2.8763
2      1/2     [1,3]            2             0.1181   0.4187    1.6827   2.0549
3      1/2     [1,3]            2             0.1298   0.4491    1.9483   2.1469
4      1/7     [1,13]           7             0.4660   0.0984    2.7495   3.4472
5      1/4     [1,7]            4             0.1661   0.1742    1.5535   3.2801
6      1/3     [1,5]            3             0.2124   0.3101    4.3151   3.5648
7      1/2     [1,3]            2             0.5295   0.4980    3.6701   2.4680
8      1/5     [1,9]            5             0.2228   0.1971    4.0185   2.2984
9      1/4     [1,7]            4             0.0332   0.1986    3.0411   2.5747

We use N = 9, M = 4 with action set $\mathcal{A}_n = \{0, 1\}$ $\forall n$. The
settings are specified in Table II with constraint $\beta = 5$. Notice
that for geometric and uniform distribution, the file lengths are
taken to be integer values. The algorithm is run for 1 million
slots in each trial and each point is the average of 100 trials.

While the decisions are made using these values, the effect
of these decisions incorporates the actual (non-memoryless)
file sizes. Fig. 7 shows the throughput-versus-V relation for
the two non-memoryless cases and the memoryless case with
matched means. The performance of all three is similar. This
illustrates that the indexing algorithm is robust under different
file length distributions.



Fig. 7. Throughput versus tradeoff parameter V under different file length 
distributions. 


VI. Conclusions 

We have investigated a file downloading system where the
network delays affect the file arrival processes. The single-user
case was solved by a variable frame length Lyapunov
optimization method. The technique was extended as a well-reasoned
heuristic algorithm for the multi-user case. Such
heuristics are important because the problem is a multi-dimensional
Markov decision problem with very high complexity.
The heuristic is simple, can be implemented in an online
fashion, and was analytically shown to achieve the desired
average power constraint. Moreover, under a special case with
no average power constraint, stochastic coupling was used to
prove the heuristic is throughput optimal. Simulations suggest
that the algorithm is in general very close to optimal. Further,
simulations suggest that non-memoryless file lengths can be
accurately approximated by the algorithm. These methods can
likely be applied in more general situations of restless multi-armed
bandit problems with constraints.


Appendix A: Comparison of Max-$\lambda$ and Min-$\lambda$


This appendix shows that different work conserving policies
can give different throughput for the N single-buffer queue
problem of Section IV-A. Suppose we have two single-buffer
queues and one server. Let $\lambda_1, \lambda_2$ be the arrival
rates of the i.i.d. Bernoulli arrival processes for queues
1 and 2. Assume $\lambda_1 \ne \lambda_2$. There are 4 system states:
(0,0), (0,1), (1,0), (1,1), where state (i, j) means queue 1
has i packets and queue 2 has j packets. Consider the (work
conserving) policy of giving queue 1 strict priority over queue
2. This is equivalent to the Max-$\lambda$ policy when $\lambda_1 > \lambda_2$, and
is equivalent to the Min-$\lambda$ policy when $\lambda_1 < \lambda_2$. Let $\theta(\lambda_1, \lambda_2)$
be the steady state throughput. Then:

$$\theta(\lambda_1, \lambda_2) = p_{1,0} + p_{0,1} + p_{1,1}$$

where $p_{i,j}$ is the steady state probability of the resulting
discrete time Markov chain. One can solve the global balance
equations to show that $\theta(1/2, 1/4) > \theta(1/4, 1/2)$, so that the
Max-$\lambda$ policy has a higher throughput than the Min-$\lambda$ policy.
In particular, it can be shown that:

• Max-$\lambda$ throughput: $\theta(1/2, 1/4) = 0.7$

• Min-$\lambda$ throughput: $\theta(1/4, 1/2) \approx 0.6786$
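These two values can be checked numerically. The sketch below is an independent illustration rather than the computation used in this paper: it builds the transition matrix of the buffer-state Markov chain for a strict-priority, work-conserving policy and approximates the stationary distribution by power iteration (the function name, the `priority` argument, and the iteration count are choices made for the example). The throughput is the stationary expected number of packets transmitted per slot, which for one server equals the stationary probability of a non-empty system.

```python
from itertools import product

def priority_throughput(lams, M, priority, iters=500):
    """Steady-state throughput of N single-buffer queues with M servers under
    a strict-priority work-conserving policy (numerical sketch)."""
    N = len(lams)
    states = list(product((0, 1), repeat=N))
    pos = {s: i for i, s in enumerate(states)}
    S = len(states)
    P = [[0.0] * S for _ in range(S)]        # transition matrix
    sent = [0] * S                           # packets transmitted in each state
    for s in states:
        served = [n for n in priority if s[n] == 1][:M]
        temp = [0 if n in served else s[n] for n in range(N)]
        sent[pos[s]] = len(served)
        for arr in product((0, 1), repeat=N):
            prob = 1.0
            for n in range(N):
                prob *= lams[n] if arr[n] else 1.0 - lams[n]
            # An arrival is accepted only if the buffer is empty after service.
            nxt = tuple(max(temp[n], arr[n]) for n in range(N))
            P[pos[s]][pos[nxt]] += prob
    pi = [1.0 / S] * S                       # power iteration from uniform start
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(S)) for j in range(S)]
    return sum(pi[i] * sent[i] for i in range(S))

max_lam = priority_throughput([0.5, 0.25], 1, [0, 1])   # queue 1 has priority
min_lam = priority_throughput([0.25, 0.5], 1, [0, 1])
```

Under these assumptions the computation agrees with the closed-form values $7/10$ and $19/28 \approx 0.6786$ obtained from the global balance equations.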


Appendix B: Proof of Lemma 5

This section proves that:

$$\sum_{n=1}^{j} \tilde{F}^{\pi}_n(\tau) \le \sum_{n=1}^{j} \tilde{F}^{\lambda}_n(\tau) \quad \forall j \in \{1, 2, \ldots, N\}. \tag{52}$$

The case $j = N$ is already established from (41). Fix $j \in \{1, 2, \ldots, N-1\}$.
Since $\pi$ cannot transmit more packets than
Max-$\lambda$ during slot $\tau$, inequality (52) is proved by considering
two cases:

1) Policy $\pi$ transmits fewer packets than policy Max-$\lambda$. Then
$\pi$ transmits fewer than M packets during slot $\tau$. The work-conserving
nature of $\pi$ implies all non-empty queues
were served, so $\tilde{F}^{\pi}_n(\tau) = 0$ for all n and (52) holds.

2) Policy $\pi$ transmits the same number of packets as
policy Max-$\lambda$. In this case, consider the temporary buffer
states of the last $N - j$ queues under policy Max-$\lambda$. If
$\sum_{n=j+1}^{N} \tilde{F}^{\lambda}_n(\tau) = 0$, then clearly the following holds:

$$\sum_{n=j+1}^{N} \tilde{F}^{\pi}_n(\tau) \ge \sum_{n=j+1}^{N} \tilde{F}^{\lambda}_n(\tau). \tag{53}$$

Subtracting (53) from (41) immediately gives (52). If
$\sum_{n=j+1}^{N} \tilde{F}^{\lambda}_n(\tau) > 0$, then all M servers of the Max-$\lambda$
system were devoted to serving the largest $\lambda_n$ queues.
So only packets in the last $N - j$ queues could be
transmitted by Max-$\lambda$ during the slot $\tau$. In particular,
$\alpha^{\lambda}_n(\tau) = 0$ for all $n \in \{1, \ldots, j\}$, and so (by (40)):

$$\sum_{n=1}^{j} \tilde{F}^{\lambda}_n(\tau) = \sum_{n=1}^{j} F^{\lambda}_n(\tau). \tag{54}$$

Thus:

$$\sum_{n=1}^{j} \tilde{F}^{\pi}_n(\tau) \le \sum_{n=1}^{j} F^{\pi}_n(\tau) \tag{55}$$

$$\le \sum_{n=1}^{j} F^{\lambda}_n(\tau) \tag{56}$$

$$= \sum_{n=1}^{j} \tilde{F}^{\lambda}_n(\tau), \tag{57}$$

where (55) holds by (39), (56) holds because (38) is true
on slot $t = \tau$, and the last equality holds by (54). This
proves (52).